were scored by maximum likelihood estimation using the three-parameter logistic
model. A nomological net was specified describing the relationships of
the  achievement  tests  to  the  achievement  constructs  and  their  relationships 
with  the  vocabulary  construct  and  the  vocabulary  tests.  The  parameters  of  the 
net  were  estimated  by  fitting  the  observed  intercorrelations  among  the  test 
scores  to  the  nomological  net,  using  the  methodology  of  linear  structural 
equations.  Maximum  likelihood  estimates  of  the  parameters  of  the  nomological 
net  indicated  essentially  equal  validities  for  the  classroom  and  adaptive 
tests in four comparisons. However, the validity of the adaptive tests was
effectively  higher  than  that  of  the  classroom  tests,  since  equal  validities 
were  achieved  with  from  25%  to  31%  fewer  items.  The  data  also  permitted  an 
analysis  of  the  effects  of  verbal  ability  on  achievement  test  performance, 
separately for the conventional and adaptive tests. The results from a
confirmatory maximum likelihood factor analysis showed a larger influence of
verbal  ability  on  achievement  test  performance  at  the  first  administration  of 
the  adaptive  test.  This  result  was  attributed  to  a necessity  to  learn  how  to 
use  the  computer  equipment  with  verbal  instructions,  which  may  have  further 
reduced  the  validity  of  the  adaptive  tests.  Combined  with  the  facts  that  the 
adaptive  tests  were  obtained  under  volunteer  conditions  while  the  classroom 
tests were obtained under "motivated" grading conditions, the results of this
study  indicate  that  computer-administered  adaptive  tests  can  provide  more 
valid  measurement  of  achievement  than  conventional  paper-and-pencil  tests. 
A Construct Validation of Adaptive Achievement Testing


In the last decade there has been an increasing amount of research on
adaptive or tailored ability testing (Weiss, 1976). In general, this research
has shown that adapting ability tests to the individual is beneficial in terms
of (1) reducing test anxiety and increasing test-taking motivation (Betz &
Weiss, 1976) and (2) providing measurement of higher precision (McBride &
Weiss, 1976; Vale, 1975). More recently, interest has extended to achievement
testing as well (Bejar, Weiss, & Gialluca, 1977; Bejar, Weiss, & Kingsbury,
1977; Brown & Weiss, 1977; Reckase, 1977). The question of the validity of
adaptive testing has not yet been investigated, however, either in the
ability testing domain or in the adaptive measurement of achievement.

A few studies have examined the "fidelity" of adaptive testing strategies,
where fidelity is defined as the correlation between true ability level and
ability level estimated by an adaptive testing procedure (e.g., McBride &
Weiss, 1976; Urry, 1976; Vale & Weiss, 1975). Of necessity, however, these
studies were computer simulation studies in which "true" ability was known and
testees were simulated by a mathematical model. Other studies that have examined
the validity of adaptive tests (e.g., Linn, Rock, & Cleary, 1969) were real-data
simulation studies in which responses to adaptive tests were simulated
from the responses of students to conventional paper-and-pencil tests. Thus,
no live-testing studies have been reported in which tests were administered
adaptively and in which the comparative validity of conventional
paper-and-pencil testing and adaptive strategies was examined.

Narrowly  defined,  validity  consists  of  ascertaining  how  well  an  individual's 
performance  on  a criterion  of  interest  can  be  forecasted  from  knowledge  of 
his/her  test  performance  on  the  test  being  validated  (e.g.,  Cronbach,  1971). 

The  usual  procedure  for  this  kind  of  validation  consists  of  assessing  the 
relationship,  or  correlation,  of  the  scores  on  the  criterion  with  the  scores  on 
the  test  being  validated. 

When  the  interest  is  in  comparing  the  validities  of  two  or  more  testing 
procedures,  this  approach  to  validation  could  give  misleading  results,  since 
a test  consists  of  several  components,  each  of  which  can  determine  to  some 
extent  a testee's  performance  on  the  test.  As  a result,  the  correlation  between 
scores  on  tests  administered  in  different  ways  may  be  partially  determined  by 
the  components  shared  by  the  testing  procedure  being  validated  and  the  criterion 
(Bejar,  1977).  Thus,  if  the  correlation  between  scores  from  Testing  Procedure 
A and  Criterion  C is  higher  than  the  correlation  between  scores  from  Testing 
Procedure  B and  Criterion  C,  this  may  not  necessarily  be  evidence  that  Testing 
Procedure  A is  more  valid  than  Testing  Procedure  B.  The  apparent  difference 
in  validity  could  be  due  simply  to  the  fact  that  Test  A and  Criterion  C 
were  measured  under  similar  conditions  and  thus  had  more  method  variance 
in  common.  For  example,  both  the  test  and  the  criterion  performance  might  be 
measured  under  conditions  which  were  arbitrarily  high  speeded,  and  the  resulting 
correlation  would  reflect  this  common  speededness. 


A broader  and  more  appealing  view  of  the  validation  process  is  construct 
validation  (Campbell  & Fiske,  1959;  Cronbach  & Meehl,  1955).  In  this  context 
the  question  is  not  how  well  some  criterion  is  predicted;  rather,  the  goal  is 
identification  of  the  constructs  that  account  for  test  performance.  This  is 
done by postulating a nomological net, that is, a theory describing the laws and
hypotheses that relate observables to observables, observables to constructs, and
constructs to constructs. The validation process then consists of ascertaining
whether the data support the theoretical hypotheses in the nomological net. If
the data are in accord with the hypotheses, the problem becomes one of
estimating the strength of the relationship between the different components of the
net.  The  practical  problem  of  assessing  the  relative  validity  of  two  testing 
procedures  becomes  one  of  determining  how  well  each  measures  the  construct  it 
is  supposed  to  measure.  This  can  be  approached  by  assessing  the  relationship 
between  the  observed  scores  derived  from  each  of  the  testing  procedures  and  the 
constructs  that  the  testing  procedures  are  designed  to  measure. 


The purpose of this study was to assess the relative construct validities
of two testing procedures for measuring achievement: a conventional
paper-and-pencil test and a computer-administered adaptive test. A nomological net
was  specified  and  fitted  to  the  intercorrelations  among  four  measures  of 
achievement  and  to  measures  of  verbal  ability.  A secondary  purpose  of  the 
study  was  to  estimate  the  relationships  of  verbal  ability  to  achievement  test 
performance. 


Method

Data  for  this  study  were  obtained  from  students  enrolled  in  a large 
introductory  biology  course  at  the  University  of  Minnesota  during  the  fall  and 
winter  quarters  of  the  1976-1977  school  year.  The  analysis  was  based  on 
volunteers for whom the following six scores were available:

1.  Classroom  biology  achievement  test,  first  midquarter  (MQ1C) 

2.  Classroom  biology  achievement  test,  second  midquarter  (MQ2C) 

3.  Adaptive  biology  achievement  test,  first  midquarter  (MQ1A) 

4.  Adaptive  biology  achievement  test,  second  midquarter  (MQ2A) 

5.  Adaptive vocabulary test at first midquarter (VOC1)

6.  Adaptive vocabulary test at second midquarter (VOC2)

The  classroom  midquarter  tests,  MQ1C  and  MQ2C,  were  the  tests  normally  given 
in  the  course  for  grading  purposes.  Data  on  both  the  adaptive  achievement 
and  vocabulary  tests  were  collected  from  students  who  volunteered  to  participate 
in  the  research  in  exchange  for  extra  points  toward  their  final  course  grade. 

Subjects 

Data  were  available  on  students  from  two  academic  quarters.  During  the 
fall  quarter,  394  students  had  volunteered  to  take  an  adaptive  midquarter  test 
based  on  the  material  from  the  first  classroom  biology  midquarter  test  and  386 
volunteered  for  the  adaptive  midquarter  test  based  on  the  material  from  the 
second classroom biology midquarter test. However, only 269 students
participated on both occasions; data analysis for fall quarter data was based on
this  group.  For  winter  quarter,  317  students  volunteered  to  participate  in  the 
first  adaptive  midquarter  test  administration  and  349  volunteered  to  participate 
in  the  second;  data  analysis  for  winter  quarter  data  was  based  on  the  230 
students  who  participated  in  both  adaptive  midquarter  tests. 

Procedure 

At both the first and second adaptive test administrations, the volunteer
students were first given the adaptive multiple-choice verbal ability test
(VOC1, VOC2) followed by the adaptive multiple-choice biology test (MQ1A,
MQ2A) based on the content covered in the classroom biology midquarter tests.

The  adaptive  tests  were  administered  by  means  of  cathode  ray  terminals  (CRTs) 
connected  to  a Hewlett-Packard  real-time  computer  system.  Instructional 
screens  explaining  the  operation  of  the  equipment  were  presented  prior  to 
testing  (DeWitt  & Weiss,  1974).  A proctor  was  present  in  the  testing  room  at 
all  times  to  assist  students  with  the  equipment.  Each  test  item  was  presented 
separately  at  the  rate  of  960  characters  per  second  on  the  CRT  screen. 

Students  responded  by  pressing  the  key  corresponding  to  the  chosen  alternative. 
During  the  fall  quarter  administration,  feedback  was  provided  after  each 
response (i.e., each student was informed whether or not he/she had answered
each  test  item  correctly);  if  an  incorrect  answer  was  given,  the  student  was 
told  which  answer  was  correct.  During  the  winter  quarter  administration, 
immediate  feedback  was  not  provided.  There  were  no  time  limits  imposed  on  the 
tests.  At  the  completion  of  testing,  students  received  a printed  report  which 
listed  questions  answered  incorrectly  and  provided  the  correct  answers. 

The  classroom  biology  achievement  test  data  (MQ1C,  MQ2C)  were  obtained 
from  course  instructors. 

Achievement  Tests 

Item  pool.  The  development  of  the  item  pools  used  in  this  study  has  been 
described  by  Bejar,  Weiss,  and  Kingsbury  (1977).  Briefly,  the  answer  sheets  for 
two  classroom  biology  midquarter  tests  from  two  previous  academic  quarters 
were used as raw data for obtaining the item parameters, namely discrimination (a),
difficulty (b), and guessing (c), of the logistic item characteristic curve
(Birnbaum,  1968)  for  each  item.  For  the  fall  quarter  administration,  114  items 
covering  the  content  of  the  first  midquarter  were  available;  the  pool 
covering  the  content  of  the  second  midquarter  contained  112  items.  For  the 
winter  administration,  44  items  were  added  to  the  first  midquarter  pool  and 
49  were  added  to  the  second  midquarter  pool;  thus,  there  were  a total  of  158 
items  in  the  first  midquarter  item  pool  and  a total  of  161  in  the  second 
midquarter  pool.  Both  the  adaptive  and  classroom  achievement  tests  were 
constructed  from  the  same  item  pool. 

Adaptive achievement tests. The adaptive achievement tests were administered
by the stradaptive strategy (Weiss, 1973). The entry point was selected
based on student-reported GPA. At the beginning of the adaptive testing session,
students were asked to state their grade point average (GPA) by selecting one
of nine equally spaced intervals from 2.00 to 4.00 (DeWitt & Weiss, 1974,
p. 49). For example, students reporting GPAs in the lowest interval began testing
in  the  least  difficult  stratum,  whereas  students  choosing  the  highest  GPA 
interval  began  in  the  most  difficult  stratum. 
The branching strategy used in the stradaptive test was the standard
"up-one/down-one"  procedure.  That  is,  if  an  item  was  answered  incorrectly  or 
with  a the  next  unadministered  item  from  the  next  easier  stratum  was 

administered; if an item was answered correctly, the next unadministered item
from  the  next  more  difficult  stratum  was  administered. 

A variable  criterion  was  used  to  terminate  testing  on  the  stradaptive 
test.  After  a student  answered  five  items  in  a stratum,  if  he/she 
answered  20%  or  fewer  correctly,  testing  was  terminated.  If  testing  was  not 
terminated  by  this  criterion  after  50  items  had  been  administered,  no  further 
items  were  administered. 
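
The branching and termination rules just described can be sketched in a few lines of Python. This is an illustrative sketch only, not the program used in the study: the function name `run_stradaptive`, its arguments, and the data layout are hypothetical. It also assumes that the termination check applies once at least five items have been answered in the current stratum, and that testing stops if the current stratum runs out of unadministered items; the text specifies neither detail.

```python
def run_stradaptive(strata, answer, entry_stratum, max_items=50,
                    min_in_stratum=5, cutoff=0.20):
    """Administer items by the up-one/down-one rule.

    strata: list of item lists, ordered from easiest to most difficult stratum.
    answer: callable taking an item and returning True if answered correctly.
    Returns a log of (stratum index, item, correct) tuples.
    """
    next_idx = [0] * len(strata)        # next unadministered item per stratum
    tally = [[0, 0] for _ in strata]    # [items answered, items correct]
    s = entry_stratum
    log = []
    while len(log) < max_items and next_idx[s] < len(strata[s]):
        item = strata[s][next_idx[s]]
        next_idx[s] += 1
        correct = bool(answer(item))
        log.append((s, item, correct))
        tally[s][0] += 1
        tally[s][1] += int(correct)
        answered, right = tally[s]
        # Variable termination: 20% or fewer correct in a stratum
        # after five items have been answered there (assumed reading).
        if answered >= min_in_stratum and right / answered <= cutoff:
            break
        # Up one stratum after a correct answer, down one after an error,
        # staying in place at the easiest or most difficult stratum.
        s = min(s + 1, len(strata) - 1) if correct else max(s - 1, 0)
    return log
```

For example, a testee who enters at the middle stratum and answers every item incorrectly drops one stratum per item and is stopped after five items in the easiest stratum.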

To construct item pools which could be used for administration of stradaptive
tests, each of the two pools (Midquarters 1 and 2) was structured by
forming  nine  strata  of  increasing  difficulty.  Mean  stratum  difficulties  were 
chosen  so  that  there  would  be  approximately  the  same  number  of  items  per 
stratum.  Within  each  stratum  the  items  were  ordered  in  terms  of  their 
discriminations  unless  this  resulted  in  items  covering  the  same  content  area 
appearing  consecutively.  Appendix  Tables  A and  B show  the  item  difficulties 
and  discriminations  for  items  in  the  nine  strata  into  which  the  first  and  second 
midquarter  item  pools  were  structured.  Table  1 summarizes  that  information  by 
showing the means and standard deviations of the discrimination (a), difficulty
(b), and guessing (c) parameter estimates for the fall and winter item pools.
For both the first and second midquarter tests, the mean discriminations,
difficulties, and "guessing" parameters were essentially identical for the two
quarters.

Table 1

Mean and Standard Deviation of Item Parameter Estimates of the
Fall and Winter Item Pools for the First and Second Adaptive
Achievement Midquarter Tests (MQ1A and MQ2A)

            Number         a                 b                 c
              of     Discrimination     Difficulty        "Guessing"
Test        Items    Mean     S.D.     Mean     S.D.     Mean     S.D.

MQ1A
  Fall       114     1.21      .46      .19     1.21      .27      .08
  Winter     158     1.20      .44      .16     1.19      .27      .08
MQ2A
  Fall       112     1.20      .41      .16     1.16      .28      .09
  Winter     161     1.20      .39      .11     1.16      .28      .08

Classroom achievement tests. The classroom biology midquarter test each
quarter  included  55  items  which  the  course  staff  selected  by  a combination  of 
pedagogical  criteria  and  procedures  from  classical  test  theory.  Their  aim  in 
constructing  these  tests  was  to  produce  a "good"  test  for  purposes  of  course 
grading.  Students  were  instructed  to  answer  50  items  of  their  choice.  For 
purposes  of  this  research,  however,  the  classroom  achievement  tests  were 
shorter  than  50  items,  since  item  parameter  estimates  were  not  available  for 
some  of  the  items.  The  item  parameter  estimates  for  the  items  in  MQ1C  and 
MQ2C  for  the  fall  administration  are  in  Appendix  Table  C;  those  for  the  winter 
administration  are  in  Appendix  Table  D. 
Table 2 shows the means and standard deviations of estimates of the three
item parameters for MQ1C and MQ2C for the fall and winter administrations.
Contrasting these figures with those in Table 1, it is evident that the items
for MQ1C were, on the average, less discriminating than those in the adaptive
test pool; the items in MQ2C were also less discriminating than those in the
adaptive test pool, but the differences between the two pools were smaller.

Table 2

Mean and Standard Deviation of Item Parameter Estimates for
the First and Second Classroom Achievement Midquarter Tests

            Number         a                 b                 c
              of     Discrimination     Difficulty        "Guessing"
Test        Items    Mean     S.D.     Mean     S.D.     Mean     S.D.

MQ1C
  Fall        39     1.09      .27      .11     1.14      .29      .06
  Winter      45     1.09      .31      .08     1.33      .25      .09
MQ2C
  Fall        41     1.17      .44      .07     1.20      .28      .07
  Winter      44     1.14      .40     -.06     1.29      .25      .08

Adaptive  Vocabulary  Tests 

The  adaptive  vocabulary  test  was  also  administered  by  the  stradaptive 
strategy.  The  same  entry  point  and  termination  rule  used  in  the  biology 
achievement  test  were  used  for  the  vocabulary  test,  except  that  the  maximum 
number  of  items  in  the  vocabulary  test  was  set  at  40. 

The  development  of  the  vocabulary  item  pool  has  been  described  by  McBride 
and Weiss (1974); the procedures for estimating the item parameters used for
the  vocabulary  tests  are  described  in  Prestwood  and  Weiss  (1977).  For  the  fall 
administration,  the  same  pool  consisting  of  321  items  was  used  for  the  first  and 
second  midquarters.  During  winter  quarter,  however,  the  pool  was  split  into 
two  comparable  halves  consisting  of  160  and  161  items  each,  used  for  the  first 
and  second  midquarter  administrations,  respectively.  Appendix  Table  E provides 
the  item  parameters  for  the  stradaptive  vocabulary  tests. 

Scoring 

All  tests  were  scored  by  maximum  likelihood  estimation,  specifying 
Birnbaum's  (1968)  three-parameter  logistic  model  as  the  response  model.  The 
item  parameter  estimates  were  edited  by  the  scoring  program  so  that  the 
maximum  value  of  the  discrimination  parameter  (a)  was  set  to  2.5,  the  maximum 
absolute value of the difficulty parameter (b) was set to 3.00, and the
maximum value of the guessing parameter (c) was set to .35. In estimating
achievement  scores,  omitted  items  were  ignored  in  the  computations.  The 
convergence  criterion  was  set  to  .0001,  and  a maximum  of  50  iterations  was 
allowed  in  the  maximum  likelihood  scoring. 
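
A minimal sketch of this kind of scoring is given below. It is not the scoring program used in the study. The function names are hypothetical, the scaling constant D = 1.7 is an assumption (the text does not state it), and Fisher scoring with a step capped at one unit is one common way to maximize the likelihood, chosen here for illustration. The parameter limits, the treatment of omitted items, the convergence criterion of .0001, and the limit of 50 iterations follow the description above.

```python
import math

D = 1.7  # logistic scaling constant; assumed, not stated in the text


def edit_params(a, b, c):
    """Apply the editing limits: a <= 2.5, |b| <= 3.00, c <= .35."""
    return min(a, 2.5), max(-3.0, min(b, 3.0)), min(c, 0.35)


def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def log_likelihood(theta, items, responses):
    """Log-likelihood of a response pattern; omitted items (None) are skipped."""
    total = 0.0
    for (a, b, c), u in zip(items, responses):
        if u is None:
            continue
        p = p_correct(theta, *edit_params(a, b, c))
        total += math.log(p) if u else math.log(1.0 - p)
    return total


def ml_score(items, responses, theta=0.0, tol=1e-4, max_iter=50):
    """Return (estimate, converged) for one testee using Fisher scoring."""
    data = [(edit_params(*item), u)
            for item, u in zip(items, responses) if u is not None]
    for _ in range(max_iter):
        score = 0.0   # first derivative of the log-likelihood
        info = 0.0    # expected (Fisher) information
        for (a, b, c), u in data:
            p = p_correct(theta, a, b, c)
            slope = D * a * (p - c) * (1.0 - p) / (1.0 - c)  # dP/dtheta
            weight = D * a * (p - c) / ((1.0 - c) * p)        # slope / (P(1-P))
            score += (u - p) * weight
            info += slope * weight
        if info <= 0.0:
            return theta, False
        step = max(-1.0, min(1.0, score / info))  # cap the step for stability
        theta += step
        if abs(step) < tol:
            return theta, True
    return theta, False
```

A response pattern with no incorrect (or no correct) answers has no finite maximum, so `ml_score` reports such a case as not converged; that corresponds to the convergence failures mentioned later in the data analysis.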


Nomological Net

The nomological net investigated consisted of three constructs, each
measured twice (see Figure 1): achievement at the first midquarter (ACH1),
achievement at the second midquarter (ACH2), and verbal ability (VER). ACH1
and ACH2 were each measured once by the classroom biology achievement midquarter
tests (MQ1C, MQ2C) and once by the adaptive biology achievement midquarter
tests (MQ1A, MQ2A). VER was also measured twice, once during the administration
of MQ1A and once during the administration of MQ2A. The arrows connecting
the constructs and the constructs with their observable measures symbolize
the parameters of the nomological net to be estimated. Thus, Figure 1
postulates that verbal ability (VER) influenced achievement at the first midquarter
(ACH1) and achievement at the second midquarter (ACH2). Achievement at
the second midquarter (ACH2) in turn was hypothesized to be influenced both by
achievement at the first midquarter (ACH1) and by verbal ability (VER).


Figure  1 

Nomological  Net  for  Construct  Validation 
of  Classroom  and  Adaptive  Achievement  Tests 


For construct validation comparisons of the adaptive and conventional
paper-and-pencil achievement tests, the parameters of interest were those that
estimated the relationships between the observables and their corresponding
constructs ($\lambda_1$ through $\lambda_4$). These parameters may be referred
to as the validities of the observable achievement scores. Thus, in the context
of Figure 1 the major purpose of this study was to compare the validities for
the adaptive achievement tests (the loadings of MQ1A and MQ2A) with the
validities for the conventional classroom paper-and-pencil achievement tests
(the loadings of MQ1C and MQ2C) in two independent sets of data.

The nomological net in Figure 1 also focuses on the effects of verbal
ability on biology achievement at both midquarters ($\gamma_1$ and $\gamma_2$)
and on the dependence of achievement at the second midquarter on achievement
at the first midquarter ($\beta$). This part of the model is relevant from a
substantive point
of  view  because  it  indicates  the  degree  to  which  assimilation  of  instruction 
is  dependent  on  verbal  ability.  From  a psychometric  point  of  view,  however, 
the  effects  of  verbal  ability  on  achievement  test  performance  are  equally 
important,  since  individual  differences  in  verbal  ability  could  possibly 
affect  the  validity  of  the  achievement  scores,  particularly  when  the  method  of 
administration  was  different  in  the  two  testing  procedures  (i.e.,  the  adaptive 
test was computer administered and the classroom test was paper-and-pencil).
Thus,  a second  objective  of  this  investigation  was  to  assess  the  influence  of 
verbal  ability  on  test  performance  under  the  two  modes  of  administration. 

Data  Analysis  Methodology 

Estimating the parameters of the nomological net. Traditionally,
construct  validation  hypotheses  have  been  partially  investigated  by  factor 
analytic  techniques.  However,  in  recent  years  the  methodology  of  linear 
structural  equations  (Goldberger  & Duncan,  1973)  has  been  applied  to  these 
kinds  of  questions  (e.g.,  Schmitt,  1978)  as  a result  of  computational 
developments  due  primarily  to  Joreskog  (e.g.,  Joreskog  & van  Thillo,  1972). 
Structural  equations  methodology  is  a more  general  analytic  technique  than 
factor  analysis,  but  it  is  very  much  related  to  it.  In  general,  a structural 
equations  model  consists  of  three  parts.  One  of  these  parts  models  the 
interrelationships  among  the  endogenous  or  dependent  variables.  The  second 
part  models  the  interrelationships  among  the  exogenous  or  independent  variables. 
The  modeling  of  both  sets  of  variables  is  by  means  of  factor  analytic  models; 
that  is,  it  is  assumed  that  the  interrelationships  within  the  dependent  and 
independent variable sets can be accounted for by a factor analytic model.
Finally,  the  third  part  of  the  structural  equations  model  connects  the 
constructs  or  factors  derived  separately  from  the  dependent  and  independent 
variables. 

The  application  of  this  methodology  to  a nomological  net  such  as  that 
shown  in  Figure  1 has  been  discussed  by  Joreskog  and  Sorbom  (1976);  the 
following discussion utilizes their notation. To construct the model, the
nomological  net  can  be  separated  into  the  three  parts  indicated  above. 

The  first  part,  the  factor  model  for  the  dependent  variables  in  the  nomological 
net  of  Figure  1,  is  seen  in  Equation  1: 


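$$
\begin{pmatrix} \mathrm{MQ1C} \\ \mathrm{MQ2C} \\ \mathrm{MQ1A} \\ \mathrm{MQ2A} \end{pmatrix}
=
\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \\ \lambda_3 & 0 \\ 0 & \lambda_4 \end{pmatrix}
\begin{pmatrix} \mathrm{ACH1} \\ \mathrm{ACH2} \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \end{pmatrix}
\tag{1}
$$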


This is simply an orthogonal two-factor model for the four biology achievement
scores (MQ1C, MQ2C, MQ1A, MQ2A). The two factors postulated were achievement
in biology at the first midquarter (ACH1) and at the second midquarter (ACH2).
The $\varepsilon_i$'s are the unique components associated with each observable
measure. For estimation purposes two of the loadings were set in the estimation
program to 1.0, while the other two were free to take on any values. The
loadings of MQ1C and MQ1A were fixed at 1.0 in order to make the model
identified, that is, to ensure the uniqueness of each parameter estimate. The
unique variances, $\sigma^2_{\varepsilon_i}$, were also estimated by the program.

The  second  part  of  the  model  describing  the  structures  of  the  independent 
variables  is  given  by  Equation  2: 


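$$
\begin{pmatrix} \mathrm{VOC1} \\ \mathrm{VOC2} \end{pmatrix}
=
\begin{pmatrix} \lambda_5 \\ \lambda_6 \end{pmatrix} (\mathrm{VER})
+
\begin{pmatrix} \varepsilon_5 \\ \varepsilon_6 \end{pmatrix}
\tag{2}
$$

Equation 2 indicates that performance on the vocabulary tests is accounted for
by the single construct, verbal ability (VER). For purposes of estimation,
$\lambda_5$ was set to 1.0 in order to make the model identified. Thus, the
parameters to be estimated were $\lambda_6$, $\sigma^2_{\varepsilon_5}$, and
$\sigma^2_{\varepsilon_6}$.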

Finally, the third part of the model relates the two achievement constructs
of biology (ACH1, ACH2) and verbal ability (VER). This relationship
was postulated to be

$$
\begin{pmatrix} 1 & 0 \\ \beta & 1 \end{pmatrix}
\begin{pmatrix} \mathrm{ACH1} \\ \mathrm{ACH2} \end{pmatrix}
=
\begin{pmatrix} \gamma_1 \\ \gamma_2 \end{pmatrix} (\mathrm{VER})
+
\begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix}
\tag{3}
$$

The parameters to be estimated in this part of the model were $\beta$, which
indicates the strength of the relationship between achievement at two points in
time; $\gamma_1$ and $\gamma_2$, which indicate the strength of the relationship
of verbal ability with achievement at the first midquarter and at the second
midquarter; and finally, the variances of the residuals, $\zeta_1$ and $\zeta_2$.

Expanding on Equation 3, ACH1 and ACH2 can be expressed as

$$\mathrm{ACH1} = \gamma_1 \mathrm{VER} + \zeta_1 \tag{4}$$

$$\mathrm{ACH2} = (\gamma_2 - \beta\gamma_1)\,\mathrm{VER} + (\zeta_2 - \beta\zeta_1). \tag{5}$$

ACH1 is the sum of two effects, verbal ability (VER) and $\zeta_1$, a residual
component. ACH2, on the other hand, is a function of verbal ability,
achievement at the first midquarter, and a residual ($\zeta_2 - \beta\zeta_1$).
Note that if $\beta = 0$, that is, if ACH1 has no effect on ACH2, Equation 5
reduces to
$$\mathrm{ACH2} = \gamma_2 \mathrm{VER} + \zeta_2. \tag{6}$$
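
The algebra leading from Equation 3 to Equations 4 and 5 can be written out row by row. The sketch below assumes that the coefficient matrix on the left side of Equation 3 is lower triangular with $\beta$ in its lower left corner, which is the form consistent with the signs in Equation 5; $\zeta_1$ and $\zeta_2$ denote the two residuals.

```latex
\begin{align*}
\begin{pmatrix} 1 & 0 \\ \beta & 1 \end{pmatrix}
\begin{pmatrix} \mathrm{ACH1} \\ \mathrm{ACH2} \end{pmatrix}
  &= \begin{pmatrix} \gamma_1 \\ \gamma_2 \end{pmatrix} \mathrm{VER}
   + \begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix}
  && \text{(Equation 3)} \\
\mathrm{ACH1} &= \gamma_1 \mathrm{VER} + \zeta_1
  && \text{(first row; Equation 4)} \\
\beta\,\mathrm{ACH1} + \mathrm{ACH2} &= \gamma_2 \mathrm{VER} + \zeta_2
  && \text{(second row)} \\
\mathrm{ACH2} &= \gamma_2 \mathrm{VER} + \zeta_2
   - \beta\,(\gamma_1 \mathrm{VER} + \zeta_1)
  && \text{(substituting Equation 4)} \\
 &= (\gamma_2 - \beta\gamma_1)\,\mathrm{VER} + (\zeta_2 - \beta\zeta_1)
  && \text{(Equation 5)}
\end{align*}
```

Setting $\beta = 0$ in the last line removes both terms that involve ACH1, which gives Equation 6.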

It was assumed that the expected value of ACH1, ACH2, and VER was zero.
The expected value of the $\varepsilon_i$'s was also zero. The
$\varepsilon_i$'s were assumed to be uncorrelated and independent among
themselves and uncorrelated with ACH1, ACH2, and VER. The residuals (i.e.,
$\zeta_1$ and $\zeta_2$) were also assumed to be uncorrelated and to have a
mean of zero. In addition to these assumptions, it was assumed that the joint
distribution of the observed variables was multivariate normal and that the
sample size was large; therefore, maximum likelihood estimates of the
parameters in Equations 1 through 3 could be obtained by
using the program LISREL (Joreskog & van Thillo, 1972).

Estimating the influence of verbal ability on test performance. The
nomological  net  described  in  Figure  1,  which  was  postulated  to  account  for 
achievement  in  biology,  did  not  allow  the  estimation  of  the  effect  of  verbal 
ability  on  achievement  test  performance.  The  role  of  this  type  of  method 
variance  analysis  in  the  validation  process  has  been  recognized  since  Campbell 
and  Fiske's  formalization  of  the  multitrait-multimethod  matrix  (1959). 

However,  precise  methods  for  estimating  the  proportion  of  method  variance  did 
not  become  available  until  the  development  of  maximum  likelihood  factor 
analysis  (Boruch  & Wolins,  1970;  Joreskog,  1974).  This  methodology 
was  used  to  estimate  the  effects  of  verbal  ability  on  achievement  test 
performance. 

An orthogonal factor model was postulated to account for the
interrelationships among the six observed scores. The pattern matrix associated
with the proposed model is shown in Table 3. An "X" indicates that the variable
was permitted to load on the factor. A "0" indicates that a variable was not
permitted to load on the factor. Setting certain loadings to zero permits the
definition of "clean" factors, while at the same time it introduces
restrictions in the estimation procedure which are necessary to ensure that the
model as a whole is identified.


Table 3

Factor Model Postulated to Account
for Variation Among Six Observed Scores

                       Factor
Variable       I       II      III     IV

MQ1A           X       X       X       0
MQ2A           X       X       0       X
MQ1C           X       X       X       0
MQ2C           X       X       0       X
VOC1           X       0       0       0
VOC2           X       0       0       0

Note. An "X" means the corresponding parameter is "free" to take any
value. A "0" indicates the parameter is "fixed" to take the value 0.
The model in Table 3 allows the identification of four influences on the
observed scores or sources of variance. The first source of variance may be
called a verbal ability factor, since it was the only factor on which the
verbal scores (VOC1 and VOC2) were allowed to load. The loadings of the four
achievement scores on this factor indicate the effect of verbal ability on
achievement test performance. The second factor may be called an achievement
factor because only the four achievement scores (MQ1A, MQ2A, MQ1C, MQ2C) were
allowed to load on it. The third and fourth factors are "occasions" factors
because they capture the unique variability associated with the first
midquarter tests (MQ1A, MQ1C) and the second midquarter tests (MQ2A, MQ2C),
respectively.

Models  such  as  that  shown  in  Table  3 can  only  be  estimated  with  factor 
analysis  programs  which  permit  restricted  solutions.  A number  of  such  programs 
exist.  The  program  ACOVS  (Joreskog,  Gruvaeus,  & van  Thillo,  1970)  was  used  in 
these  analyses.  This  program  obtains  maximum  likelihood  estimates  of  each  of 
the loadings under the usual stochastic assumptions of factor analysis. If the
sample size is large and the data are multivariate normally distributed, the
measure of fit computed by the program is distributed as a $\chi^2$ variable
with known degrees of freedom.
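
To make the idea of a restricted solution concrete, the sketch below encodes the pattern of Table 3 and computes the covariance matrix implied by an orthogonal factor model, namely the loading matrix times its transpose plus a diagonal matrix of unique variances. It is an illustration only and does not reproduce ACOVS: the names `PATTERN`, `apply_pattern`, and `implied_covariance` are hypothetical, and any loading values passed in are made up rather than taken from the study.

```python
VARIABLES = ["MQ1A", "MQ2A", "MQ1C", "MQ2C", "VOC1", "VOC2"]

# Rows follow VARIABLES; columns are Factors I-IV of Table 3.
# 1 marks a free loading ("X"); 0 marks a loading fixed at zero.
PATTERN = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]


def apply_pattern(loadings, pattern=PATTERN):
    """Force every loading that the pattern fixes at zero to be zero."""
    return [[value * free for value, free in zip(l_row, p_row)]
            for l_row, p_row in zip(loadings, pattern)]


def implied_covariance(loadings, uniquenesses):
    """Covariance implied by orthogonal factors: L L' + diag(uniquenesses)."""
    n = len(loadings)
    sigma = [[sum(x * y for x, y in zip(loadings[i], loadings[j]))
              for j in range(n)] for i in range(n)]
    for i in range(n):
        sigma[i][i] += uniquenesses[i]
    return sigma
```

Under this pattern the implied covariance between a vocabulary score and an achievement score depends only on their loadings on Factor I, which is why those loadings can be read as the effect of verbal ability on achievement test performance. An estimation program searches for the free loadings and unique variances that bring the implied matrix as close as possible to the observed one.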

Data  Analysis 


Subject pool. During the fall and winter administrations, 269 and 213
students,  respectively,  had  completed  all  six  tests.  However,  data  for  some 
students  were  eliminated  from  all  analyses  for  one  of  two  reasons:  (1)  If  the 
scoring  procedure  failed  to  converge  on  any  one  of  the  six  scores,  that  student 
was  eliminated  from  the  analyses;  (2)  If  a student's  maximum  likelihood  score 
on the adaptive test was too "discrepant" from the classroom test maximum
likelihood  score,  the  student  was  eliminated  from  the  analyses.  Specifically, 
the  difference  in  each  student's  maximum  likelihood  scores — MQ1C-MQ1A  and  MQ2C- 
MQ2A — was  computed.  If  the  absolute  value  of  either  score  difference  was 
2.00  or  larger,  the  student  was  excluded.  Invariably,  the  difference  was 
positive  for  the  students  eliminated,  which  indicated  that  the  student  performed 
on  the  adaptive  achievement  test  two  units  below  his/her  classroom  achievement 
test  performance.  The  rationale  for  excluding  such  students  was  that  they 
probably  were  not  "doing  their  best"  taking  the  adaptive  achievement  test, 
since  it  was  a volunteer  situation.  After  excluding  students  for  either  of 
these  two  reasons,  there  were  213  and  187  students,  respectively,  who  had  taken 
all  six  tests  during  fall  and  winter  administrations.  The  analyses  and  results 
that  follow  are  based  on  these  students  only. 
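
The two exclusion rules described above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the original scoring software: the `StudentScores` record, the use of `None` to mark a score whose estimation failed to converge, and the function name `is_retained` are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional

# Cutoff taken from the text: a difference of 2.00 or larger leads to exclusion.
DISCREPANCY_CUTOFF = 2.00


@dataclass
class StudentScores:
    """Maximum likelihood scores for one student (hypothetical record layout).

    None marks a score for which the scoring procedure did not converge.
    """
    mq1c: Optional[float]
    mq2c: Optional[float]
    mq1a: Optional[float]
    mq2a: Optional[float]
    voc1: Optional[float]
    voc2: Optional[float]


def is_retained(s: StudentScores) -> bool:
    """Return True if the student survives both exclusion rules."""
    scores = [s.mq1c, s.mq2c, s.mq1a, s.mq2a, s.voc1, s.voc2]
    # Rule 1: every one of the six scores must have converged.
    if any(v is None for v in scores):
        return False
    # Rule 2: classroom and adaptive scores must differ by less than the cutoff.
    if abs(s.mq1c - s.mq1a) >= DISCREPANCY_CUTOFF:
        return False
    if abs(s.mq2c - s.mq2a) >= DISCREPANCY_CUTOFF:
        return False
    return True
```

Applying `is_retained` to every student who completed all six tests would yield the analysis sample for a given quarter.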

Distributional analysis. An assumption needed to obtain maximum likelihood
estimates of the parameters by fitting the structural model to a correlation
matrix is that the distribution of the scores be multivariate normal. Although
some procedures for testing multivariate normality exist (e.g., Andrews,
Gnanadesikan, & Warner, 1973), they are not easily implemented. For that
reason, the univariate normality of each score was investigated instead. If
the multivariate distribution of a set of scores is normal, it follows that
the component scores are each also normally distributed. However, demonstrating
that  each  score  is  normally  distributed  does  not  guarantee  that  the  joint 
distribution  of  all  scores  will  be  multivariate  normal. 

The  univariate  normality  of  each  of  the  scores  was  tested  by  means  of  the 
Kolmogorov-Smirnov statistic (see, e.g., Lindgren, 1968). According to this
test, if the observed cumulative frequency departs from the theoretically expected
cumulative frequency by more than a critical amount, the hypothesis that the scores
are normally distributed is rejected. The statistic is

$$D = \max_x \left| F_o(x) - F_e(x) \right| \qquad [7]$$

where

D is the absolute value of the maximum discrepancy,

$F_o(x)$ is the observed cumulative frequency of x, and

$F_e(x)$ is the expected cumulative frequency of x.

The  null  hypothesis  was  tested  at  the  .05  level  for  each  variable  in  each 
quarter. 
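
A minimal sketch of the statistic in Equation 7 follows, using only the Python standard library. Several details are assumptions made for illustration and are not stated in the report: the expected distribution is a normal curve with a caller-supplied mean and standard deviation, and the probability is taken from the large-sample Kolmogorov series. If the mean and standard deviation are estimated from the same sample, that probability tends to be too large.

```python
import math
from typing import Sequence


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Expected cumulative frequency F_e(x) under a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ks_statistic(scores: Sequence[float], mean: float, sd: float) -> float:
    """Maximum absolute discrepancy D between observed and expected CDFs."""
    xs = sorted(scores)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fe = normal_cdf(x, mean, sd)
        # The observed CDF jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, abs((i + 1) / n - fe), abs(i / n - fe))
    return d


def ks_asymptotic_p(d: float, n: int, terms: int = 100) -> float:
    """Large-sample probability of a discrepancy at least as large as d."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        # The series converges slowly here and the probability is essentially 1.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))
```

Normality would be rejected at the .05 level whenever the probability returned by `ks_asymptotic_p` falls below .05.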

Results 

Distributional  Analysis 

The results of applying the Kolmogorov-Smirnov test to the maximum
likelihood scores on each of the six tests are shown for the fall, winter, and
combined data in Table 4. All six scores were judged normally distributed in
each quarter and in the combined data. As can be seen, the probability
associated with the test statistic was well above .05 in every instance, with a
minimum value of p=.17 for the VOC2 data in the fall and combined groups. Thus,
the results lend support to the assumption that the joint distribution of the
observed scores may be multivariate normal.

Table 4

Results of the Kolmogorov-Smirnov Test of Normality
for Fall, Winter, and Combined Groups

                                           Test
Group and Statistic        MQ1C   MQ2C   MQ1A   MQ2A   VOC1   VOC2

Fall (N=213)
  Maximum Discrepancy      -.04   -.03    .05   -.04   -.05   -.07
  Probability               .94    .99    .74    .81    .62    .17

Winter (N=187)
  Maximum Discrepancy      -.06    .04    .05   -.04   -.05   -.07
  Probability               .56    .98    .63    .83    .71    .30

Combined Groups (N=400)
  Maximum Discrepancy      -.04   -.03   -.04   -.04   -.04   -.06
  Probability               .63    .92    .59    .55    .59    .17

Test  Score  Intercorrelations 

Estimates of the parameters in Figure 1 were obtained by fitting the model
to a correlation matrix. Thus, the first step toward that goal was the computation
of the intercorrelations among the six maximum likelihood scores. These
intercorrelations, along with the means and standard deviations of each score,
are shown in Table 5 for the fall and winter data separately and combined. In
general, the variabilities of the classroom achievement test scores (MQ1C and
MQ2C) were higher than the variabilities of the corresponding adaptive achievement
test scores (MQ1A and MQ2A). This suggests that the volunteers were more homogeneous
with respect to achievement than was the class as a whole. Another contrast
seen in Table 5 is that the mean achievement scores on the classroom tests were
higher than the corresponding means for the adaptive tests. Since the adaptive
achievement test was taken anywhere between one day and three weeks after the
classroom achievement test, this may indicate that some forgetting took place.

An  alternative  explanation  for  the  lower  means  on  the  adaptive  achievement  tests 
is  that  the  students  were  less  motivated  to  perform  to  their  full  capabilities 
on  the  adaptive  test;  scores  on  the  adaptive  achievement  test  did  not  count 
toward  their  course  grades,  while  their  grades  were  based  on  scores  from  the 
classroom  tests. 


Table 5

Means and Standard Deviations and Intercorrelation Matrices of
Six Scores for Fall, Winter, and Combined Data

                                           Test
Group and Test       Mean     S.D.    MQ1C   MQ2C   MQ1A   MQ2A   VOC1

Fall (N=213)
  MQ1C               .551    1.028
  MQ2C               .434     .898    .699
  MQ1A               .024     .883    .741   .665
  MQ2A              -.048     .874    .665   .748   .692
  VOC1              -.454     .966    .230   .239   .335   .246
  VOC2              -.329     .967    .274   .277   .375   .278   .890

Winter (N=187)
  MQ1C               .529     .975
  MQ2C               .438     .904    .610
  MQ1A              -.120     .915    .782   .586
  MQ2A               .014     .815    .619   .768   .629
  VOC1              -.473     .983    .387   .408   .376   .378
  VOC2              -.418    1.052    .371   .349   .346   .331   .851

Combined (N=400)
  MQ1C               .541    1.000
  MQ2C               .436     .900    .662
  MQ1A              -.043     .900    .758   .625
  MQ2A              -.019     .847    .644   .756   .657
  VOC1              -.463     .973    .302   .319   .354   .305
  VOC2              -.371    1.001    .320   .311   .362   .300   .870

As expected, the intercorrelation matrices show that the achievement test
scores were more highly correlated among themselves than they were with the
vocabulary scores. Within the achievement data, the highest correlations in
all three matrices were between tests taken on the same material (i.e.,
MQ1A and MQ1C, and MQ2A and MQ2C).
Nomological Net Analysis

Validity of classroom and adaptive achievement tests. The results of
fitting the validity model to the fall data are shown in Table 6. The $\chi^2$
reported at the bottom is a measure of the overall fit of the model to the data.

Table 6

Standardized Maximum Likelihood Parameter Estimates of Achievement

                                                                                            Estimate
Parameter                  Description                                                    Fall    Winter

$\lambda_1$                Validity of classroom biology achievement test,
                             first midquarter (MQ1C)                                      .853     .893
$\lambda_2$                Validity of classroom biology achievement test,
                             second midquarter (MQ2C)                                     .866     .868
$\lambda_3$                Validity of adaptive biology achievement test,
                             first midquarter (MQ1A)                                      .869     .876
$\lambda_4$                Validity of adaptive biology achievement test,
                             second midquarter (MQ2A)                                     .864     .884
$\lambda_5$                Validity of adaptive vocabulary test at first
                             midquarter (VOC1)                                            .890     .972
$\lambda_6$                Validity of adaptive vocabulary test at second
                             midquarter (VOC2)                                            .999     .876
$\beta$                    Regression of achievement at second midquarter (ACH2)
                             on achievement at first midquarter (ACH1)                    .925     .734
$\gamma_1$                 Regression of achievement at first midquarter (ACH1)
                             on verbal ability (VER)                                      .380     .447
$\gamma_2$                 Regression of achievement at second midquarter (ACH2)
                             on verbal ability (VER)                                     -.031     .123
$\zeta_1$                  Variance of residuals for achievement, first
                             midquarter (ACH1)                                            .855     .800
$\zeta_2$                  Variance of residuals for achievement, second
                             midquarter (ACH2)                                            .165     .361
$\phi$                     Variance of verbal ability (VER)                              1.000    1.000
$\sigma^2_{\epsilon_1}$    Error variance for MQ1C                                        .521     .451
$\sigma^2_{\epsilon_2}$    Error variance for MQ2C                                        .496     .482
$\sigma^2_{\epsilon_3}$    Error variance for MQ1A                                        .500     .496
$\sigma^2_{\epsilon_4}$    Error variance for MQ2A                                        .504     .467
$\sigma^2_{\epsilon_5}$    Error variance for VOC1                                        .456     .236
$\sigma^2_{\epsilon_6}$    Error variance for VOC2                                        .000     .483

$\chi^2$                                                                                  8.39     4.04
df                                                                                           6        6
p                                                                                          .21      .67
A better fit of the model to the data is indicated by higher values of p, the
probability of obtaining a $\chi^2$ value at least as large as the one observed
if the model holds. In this particular case, the probability
was .21, indicating an adequate fit of the model to the fall data. This may be
considered evidence in favor of the validity of the nomological net postulated
earlier. However, to determine whether the adaptive or classroom tests were
more valid measures of achievement requires examination of the values of the
parameter estimates.
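
The probabilities attached to the fit statistics can be checked from the reported $\chi^2$ values and degrees of freedom. The sketch below is an illustration rather than the routine used by ACOVS; the function name `chi2_sf` is introduced here, and it covers only the cases needed in this report (1 degree of freedom and even degrees of freedom), for which the upper-tail probability has a simple closed form.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with df degrees of freedom.

    Only df == 1 and even df are handled, because closed forms exist for them.
    """
    if x < 0 or df < 1:
        raise ValueError("x must be non-negative and df must be positive")
    if df == 1:
        # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        # For df = 2m, P(X > x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
        half = x / 2.0
        term = 1.0
        total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise NotImplementedError("odd df greater than 1 is not covered by this sketch")


fall_p = chi2_sf(8.39, 6)    # fall validity model, Table 6
winter_p = chi2_sf(4.04, 6)  # winter validity model, Table 6
```

Rounded to two decimals, `fall_p` and `winter_p` agree with the values of .21 and .67 reported in Table 6.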

The first four lines of Table 6 show the standardized loadings (validities)
of the four achievement measures on their respective constructs. For the first
midquarter in the fall group, the coefficient ($\lambda_1$) was .853 for the classroom
achievement test (MQ1C); for the adaptive achievement test (MQ1A), the coefficient
($\lambda_3$) was .869. The corresponding data for the second midquarter in the
fall group ($\lambda_2$ and $\lambda_4$) were .866 for MQ2C and .864 for MQ2A.

The  last  column  of  Table  6 shows  the  results  for  winter  data.  Again,  the 
fit  statistic  at  the  bottom  of  the  table  indicated  that  the  nomological  net 
postulated  for  these  data  was  a reasonable  summary  of  the  intercorrelations 
among  the  six  scores.  Moreover,  for  the  winter  data  the  fit  was  better  than 
for the fall data (p=.67 vs. .21).

The validity coefficients for the four biology achievement tests ($\lambda_1$
through $\lambda_4$) indicated that for the winter data the first classroom midquarter
test (MQ1C) was slightly more valid than the corresponding adaptive midquarter
test ($\lambda_1$=.893 vs. $\lambda_3$=.876). This was a reversal of the findings with fall data,
where the adaptive midquarter test was found to be more valid ($\lambda_1$=.853 vs.
$\lambda_3$=.869). However, for winter data, the second adaptive midquarter test was
more valid than the classroom counterpart ($\lambda_4$=.884 vs. $\lambda_2$=.868), whereas for
fall both testing procedures were found to be about equally valid ($\lambda_4$=.864 vs.
$\lambda_2$=.866).

Table  7 summarizes  the  construct  validity  correlations  in  Table  6 and 
provides  information  on  the  average  numbers  of  items  in  the  classroom  and 
adaptive  biology  achievement  tests.  As  Table  7 shows,  both  testing  procedures 
achieved  essentially  equal  validities  in  both  quarters.  However,  in  both  cases 
the  adaptive  achievement  tests  achieved  essentially  the  same  level  of  validity 
with  considerably  fewer  items,  on  the  average.  For  the  fall  data,  the  average 
length  of  the  first  adaptive  achievement  midquarter  test  was  24.1  items,  while 
that  of  the  first  classroom  achievement  midquarter  test  was  35  items;  the 
difference  of  11  items  represents  a reduction  of  31%  in  the  length  relative  to 
the classroom achievement test with a slight increase in validity. For the
other three tests, the reductions due to adaptive achievement testing were 27%
for the second fall test and 25% for each of the two winter tests, again with
essentially no differences in validities.
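
The reductions in test length can be recomputed from the average numbers of items reported in Table 7. The short sketch below does so; the dictionary layout and the function name `percent_reduction` are introduced here for illustration. Because Table 7 reports rounded averages, the recomputed percentages may differ from those quoted in the text by about a percentage point.

```python
# Average numbers of items from Table 7: (classroom, adaptive).
TABLE_7_ITEMS = {
    ("Fall", "First Midquarter"): (35.0, 24.1),
    ("Fall", "Second Midquarter"): (37.0, 27.2),
    ("Winter", "First Midquarter"): (40.4, 30.4),
    ("Winter", "Second Midquarter"): (40.2, 30.0),
}


def percent_reduction(classroom_items: float, adaptive_items: float) -> float:
    """Reduction in test length relative to the classroom test, in percent."""
    return 100.0 * (classroom_items - adaptive_items) / classroom_items


reductions = {
    test: percent_reduction(classroom, adaptive)
    for test, (classroom, adaptive) in TABLE_7_ITEMS.items()
}

for (quarter, test), value in reductions.items():
    print(f"{quarter:6s} {test:17s} {value:5.1f}% fewer items")
```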

Thus,  the  adaptive  achievement  test  was  effectively  more  valid,  since 
it  required  fewer  items  to  yield  scores  as  valid  as  the  classroom  achievement 
test.  However,  it  may  be  noted  that  the  adaptive  achievement  tests  were  drawn 
from item pools with a higher mean discrimination than the items in the
classroom achievement tests. This, however, seems to be an inherent advantage
of the adaptive achievement testing procedure and not an unfair one. Additional
research comparing adaptive and conventional paper-and-pencil achievement tests
will be necessary to determine whether the effectively higher validity of
adaptive tests was due to the higher average item discriminations or to the
process of adapting the test to each student.

Table 7

Construct Validity Correlations (r) for
Classroom and Adaptive Biology Achievement Tests

                                 Classroom               Adaptive
                              Average                 Average
Group and Test               No. Items      r        No. Items      r

Fall Quarter (N=213)
  First Midquarter             35.0       .853         24.1       .869
  Second Midquarter            37.0       .866         27.2       .864

Winter Quarter (N=187)
  First Midquarter             40.4       .893         30.4       .876
  Second Midquarter            40.2       .868         30.0       .884

Other parameters of the nomological net. Table 6 also shows the estimated
regression ($\beta$) of achievement at the second midquarter (ACH2) on achievement
at the first midquarter (ACH1), for both the fall and winter data. For the
fall data this coefficient was very high (.925), suggesting that subsequent
achievement was largely determined by previous achievement. For the winter
data the regression coefficient was $\beta$=.734, which suggested a decrease in the
influence of ACH1 on ACH2; but since these are standardized estimates, that
conclusion may not be completely justified because of possibly different
variabilities in achievement between the two quarters.

The regression coefficients ($\gamma_1$, $\gamma_2$) of the achievement constructs (ACH1
and ACH2) on verbal ability for both fall and winter data are also shown in
Table 6. For the fall data, achievement at the first midquarter (ACH1) seemed
to be more influenced by verbal ability ($\gamma_1$=.380) than achievement at the second
midquarter ($\gamma_2$=-.031). Since the regression of ACH2 on VER is a partial
regression weight, the fact that it was close to zero indicates that verbal
ability did not influence achievement at the second midquarter beyond the
influence that it exerted through ACH1. The amount of achievement variance
that remained unexplained after taking into consideration verbal ability is
indicated by the residual variances of ACH1 and ACH2, $\zeta_1$ and $\zeta_2$. Since the
solution was standardized, these data can be interpreted directly as proportions
of variance. Thus, for ACH1 most of the variance (85%) remained
unexplained in this model. The other 15% was explained, in this case, by
verbal ability. By contrast, for ACH2, the proportion left unexplained was
only 17%, i.e., verbal ability and achievement at the first midquarter
accounted for 83% of the variance.
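
The residual variances can be reproduced from the standardized regression coefficients in Table 6. The sketch below assumes the structure described in the text: ACH1 is regressed on VER, ACH2 is regressed on ACH1 and VER, all constructs have unit variance, and the residuals are uncorrelated with the predictors. The function name `residual_variances` is introduced here for illustration.

```python
from typing import Tuple


def residual_variances(gamma1: float, gamma2: float, beta: float,
                       phi: float = 1.0) -> Tuple[float, float]:
    """Unexplained variance of ACH1 and ACH2 in a standardized solution.

    gamma1: regression of ACH1 on VER
    gamma2: regression of ACH2 on VER
    beta:   regression of ACH2 on ACH1
    phi:    variance of VER (1.0 in the standardized solution)
    """
    # ACH1 = gamma1 * VER + residual
    explained_ach1 = gamma1 ** 2 * phi
    cov_ach1_ver = gamma1 * phi
    # ACH2 = beta * ACH1 + gamma2 * VER + residual, with Var(ACH1) = 1
    explained_ach2 = (beta ** 2
                      + gamma2 ** 2 * phi
                      + 2.0 * beta * gamma2 * cov_ach1_ver)
    return 1.0 - explained_ach1, 1.0 - explained_ach2


fall = residual_variances(gamma1=0.380, gamma2=-0.031, beta=0.925)
winter = residual_variances(gamma1=0.447, gamma2=0.123, beta=0.734)
```

For the fall data this gives values close to the .855 and .165 reported in Table 6; the winter values agree with .800 and .361 to within rounding of the published coefficients.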


As was true of the fall data, in the winter data verbal ability had a
moderate, but somewhat larger, influence on achievement at the first midquarter.
This was reflected in the residual variance of ACH1 ($\zeta_1$), which was 80% as
compared with 85% for the fall data. Thus, verbal ability accounted for 5
percentage points more of the variance of ACH1 in the winter data than in the
fall data. There was, on the other hand, an increase in the winter data in the
proportion of the unexplained variance of ACH2 ($\zeta_2$). In the fall data that
proportion was 17%; in the winter data it was 36%.

Effect of Verbal Ability on Achievement Test Performance

Table  8 shows  the  maximum  likelihood  estimates  of  the  factor  pattern 
matrix  associated  with  the  four-factor  model  postulated  to  account  for  the 
intercorrelations  among  the  six  tests  for  the  fall  and  winter  data  combined. 
The $\chi^2$ statistic of 6.15 with 1 degree of freedom (p=.013) suggests that the
fit  was  statistically  not  very  good.  However,  the  residual  correlation 
matrix  (i.e.,  the  difference  between  the  observed  correlation  matrix  and  the 
reproduced  correlation  matrix  computed  using  the  solution  in  Table  8)  was 
nearly  zero  with  the  largest  residual  correlation  being  -.014,  which  suggests 
an  adequate  fit  of  the  data  to  the  model. 


Table 8

Maximum Likelihood Solution for Four-Factor Model
for Fall and Winter Data Combined (N=400)

                                  Factor
              I           II             III           IV
Test       Ability    Achievement    Occasion 1    Occasion 2

MQ1C        .334         .750           .329          .0
MQ2C        .337         .726           .0            .334
MQ1A        .384         .703           .334          .0
MQ2A        .324         .739           .0            .331
VOC1        .928         .0             .0            .0
VOC2        .937         .0             .0            .0

Note. $\chi^2$=6.15; df=1; p=.013

The variance component estimates derived from the solution in Table 8 are
shown in Table 9. These were obtained by squaring the corresponding loadings.
The first row of Table 9 shows the proportion of performance variance in each
test accounted for by verbal ability. For the two classroom achievement midquarter
tests (MQ1C and MQ2C), the proportion was .11. For the first adaptive
achievement midquarter (MQ1A), that proportion was .15; and for the second
adaptive achievement midquarter (MQ2A), it was .10.

The second row of Table 9 shows the proportion of variance due to achievement
in biology. For the first and second classroom achievement midquarter
tests (MQ1C and MQ2C) and the second adaptive achievement midquarter (MQ2A),
between 53% and 55% of the variance was due to biology achievement. For the
first adaptive achievement midquarter, the corresponding percentage was 49%.

The  next  two  rows  show  the  proportion  of  occasion-specific  variance  associated 
with  the  four  achievement  tests.  In  all  cases,  that  proportion  was  .11. 
Finally,  the  last  row  shows  the  proportion  of  variance  unaccounted  for  in  each 
test,  which  was  essentially  constant  for  each  of  the  achievement  tests. 
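
The variance components follow from the factor loadings by squaring, with the residual obtained as whatever remains of the unit variance. The sketch below applies this to the loadings in Table 8, under the assumption that the four factors are uncorrelated so that squared loadings add. The names `LOADINGS`, `FACTORS`, and `variance_components` are introduced here. A few entries (for example, the achievement component of MQ1C) may differ from the published table in the second decimal, presumably because the published loadings are rounded.

```python
# Factor loadings from Table 8, in the order:
# verbal ability, achievement, occasion 1, occasion 2.
LOADINGS = {
    "MQ1C": (0.334, 0.750, 0.329, 0.0),
    "MQ2C": (0.337, 0.726, 0.0, 0.334),
    "MQ1A": (0.384, 0.703, 0.334, 0.0),
    "MQ2A": (0.324, 0.739, 0.0, 0.331),
    "VOC1": (0.928, 0.0, 0.0, 0.0),
    "VOC2": (0.937, 0.0, 0.0, 0.0),
}
FACTORS = ("Verbal Ability", "Achievement", "Occasion 1", "Occasion 2")


def variance_components(loadings):
    """Squared loadings per factor plus the residual share of unit variance."""
    parts = {factor: loading ** 2 for factor, loading in zip(FACTORS, loadings)}
    parts["Residual"] = 1.0 - sum(parts.values())
    return parts


components = {test: variance_components(row) for test, row in LOADINGS.items()}

for test, parts in components.items():
    row = "  ".join(f"{name}={value:.2f}" for name, value in parts.items())
    print(f"{test}: {row}")
```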
Table 9

Variance Components for Fall and Winter Data Combined (N=400)

                                        Test
Source of Variance     MQ1C    MQ2C    MQ1A    MQ2A    VOC1    VOC2

Verbal Ability          .11     .11     .15     .10     .86     .88
Achievement             .55     .53     .49     .55     .00     .00
Occasion 1              .11     .00     .11     .00     .00     .00
Occasion 2              .00     .11     .00     .11     .00     .00
Residual                .23     .25     .25     .24     .14     .12

Total                  1.00    1.00    1.00    1.00    1.00    1.00

Discussion  and  Conclusions 


The focus of the study was to assess the validities of two testing
procedures, conventional paper-and-pencil and adaptive, in the context of a
meaningful nomological net or model of the achievement process. This model
was  illustrated  in  Figure  1 and  found  to  fit  data  from  two  academic  quarters 
very  well.  In  the  context  of  this  model,  validity  was  indexed  by  the  loading 
of  observed  performance  on  the  corresponding  achievement  construct.  It  was 
found  that  out  of  four  comparisons,  the  adaptive  procedure  was  somewhat  more 
valid  in  two  cases,  equally  valid  in  one,  and  somewhat  less  valid  in  another. 
However,  in  all  instances,  the  adaptive  procedure  was  at  least  25%  shorter  on 
the  average  than  the  conventional  paper-and-pencil  testing  procedure.  Thus, 
in  a practical  sense,  the  adaptive  testing  procedure  was  considerably  more 
valid  in  all  instances. 

While  these  results  demonstrate  the  increased  efficiency  of  adaptive 
testing  in  practical  situations,  the  results  also  raise  questions  of  a 
theoretical  nature.  Previous  results  reported  by  Bejar,  Weiss,  and 
Gialluca  (1977)  indicated  that  the  adaptive  test  provided  higher  levels  of 
information  than  did  the  conventional  paper-and-pencil  test,  even  though 
the  adaptive  test  was  shorter  on  the  average.  The  substantial  differences  in 
information  in  favor  of  the  adaptive  testing  procedure  would  lead  to  the 
expectation  that  the  scores  from  the  adaptive  testing  procedure  would 
likewise  be  substantially  more  valid  while  at  the  same  time  reducing  test 
length.  However,  this  expectation  was  not  totally  fulfilled.  This  might  have 
resulted  from  the  presence  of  situational  factors  during  the  administration  of 
the  adaptive  test  which  were  not  present  during  the  classroom  paper-and-pencil 
administration. 

One  such  factor  was  identified  in  the  present  study — namely,  the  larger 
influence  of  verbal  ability  in  the  first  adaptive  test  administration.  The 
results  from  the  confirmatory  factor  analysis  helped  in  understanding  the 
findings  from  the  nomological  net  analysis  with  respect  to  the  validity  of 
adaptive and conventional paper-and-pencil achievement testing scores by
corroborating the differential influence of verbal ability on test performance.

The  data  showed  that  performance  on  the  first  adaptive  achievement  midquarter 
test  (MQ1A)  was  more  dependent  on  verbal  ability  than  was  performance  on  the 
other  achievement  tests.  This  may  have  been  due  to  the  fact  that  learning 
to  properly  operate  the  testing  equipment  was  dependent  to  some  extent  on 
verbal  ability.  By  contrast,  the  occasion-specific  influence  on  each  of  the 
achievement tests was the same. This suggests that the increased influence of
verbal ability on the first adaptive achievement midquarter test reduced the
role that achievement could otherwise have played. As a result, the validity
of the first adaptive achievement midquarter test reported earlier was
probably underestimated.

The net result of this situational difference between the adaptive and
conventional classroom paper-and-pencil test administrations may have been to
introduce a bias into the achievement estimates and corresponding information.
That is, the item characteristic curves (ICCs) derived from conventional classroom
paper-and-pencil administration of a test (which were used to parameterize
the items used in the adaptive administration) may not have been an accurate
portrayal of the relationship between performance on a test item and achievement
the first time the item was administered by computer.

This  situational  vulnerability  of  the  ICC  model  may  be  surprising  in  view 
of  the  "invariant"  nature  of  ICC  models.  However,  the  invariance  property  of 
ICC models pertains to populations responding to a test under similar
circumstances. There is nothing in the theory to suggest that the model is
situationally  invariant.  Whether  this  is  the  case  or  not  is  a matter  of 
empirical  test.  In  the  present  study,  not  only  was  the  medium  of  administration 
different  but  so  were  the  motives  for  taking  the  test.  That  is,  the  adaptive 
test  data  were  obtained  on  volunteers,  while  the  classroom  test  data  were 
used  for  grading  purposes.  In  view  of  these  differences,  the  expectation  that 
the adaptive procedure would be substantially more valid may have been
unrealistic.

It  is  clear  from  this  discussion  that  further  validation  studies  of 
adaptive  testing  should  be  careful  to  equate  as  much  as  possible  the  conditions 
of  administration.  Specifically,  the  appropriateness  of  ICCs  derived  under 
circumstances  different  from  those  surrounding  adaptive  testing  should  be 
carefully  evaluated. 

The  focus  of  this  investigation  has  been  on  the  psychometric  properties 
of  adaptive  and  conventional  paper-and-pencil  testing;  however,  because  of  the 
construct  validation  approach,  the  results  presented  here  seem  to  have  relevance 
to  a larger  question — namely,  the  identification  of  some  of  the  components 
underlying  competence  and  achievement  (see  Glaser,  1976).  Historically, 
construct  validation  has  played  a minor  role  in  the  achievement  testing  field. 

One reason for this is that users of achievement tests, as well as some
psychometricians (e.g., Shoemaker, 1975), are primarily concerned with content and
predictive  validity.  Their  orientation  is  behavioristic;  the  question  they 
ask  is,  what  can  this  individual  do?  Tests  which  address  this  question  are 
called  criterion-referenced  tests  (Glaser,  1963;  Glaser  & Nitko,  1971;  see 
Hambleton,  Swaminathan,  Algina,  & Coulson,  1978,  for  a recent  review);  however, 
Messick  (1975)  argues  persuasively  that  tests  must  also  be  construct  referenced. 
That  is,  to  fully  understand  test  scores,  the  processes,  attributes,  and  traits 
determining  test  performance  must  be  understood. 

Since  verbal  ability  is  an  indicator  of  information-processing  efficiency 
in  short-term  memory  (Hunt,  Lunneborg,  & Lewis,  1975;  Glaser,  1976),  the 
results  of  this  study  give  an  indication  of  the  influence  this  cognitive 
mechanism  has  on  achievement,  at  least  within  this  course.  Knowledge  of  the 
cognitive mechanism underlying achievement would seem to be a prerequisite to
adaptive instruction. For instance, Glaser (1976) has suggested:

They  [tests]  will  have  to  assess  performance  attainments 
and  capabilities  that  can  be  matched  to  available 
educational  options  in  more  detailed  ways  than  can  be 
carried  out  with  currently  used  testing  and  assessment 
procedures.  (Glaser,  1976,  p.  21) 

The role of achievement testing in this broader context is to provide
information relevant to instructional decisions about an individual in an instructional
course.  The  results  of  the  present  study  have  demonstrated  that  adaptive 
testing  can  fulfill  that  assignment  more  efficiently  than  conventional 
paper-and-pencil  testing. 
APPENDIX: SUPPLEMENTARY TABLES


Table  A 

Item Number, Discrimination (a), Difficulty (b), and Guessing (c)
Parameters for Items in the Midquarter 1 Stradaptive Item Pool


Item 

a 

£ 

a 

b 

— 

— . . 

a 

£ 

2 

Stratum  9 

Stratum  6 

Stratum  3, 

con  t . 

(15  items) 

(19  items) 

3215 

1.59 

-.82 

.23 

3209 

2.50 

2.29 

.29 

3047 

1.66 

. 44 

.29 

3011 

1.32 

-.86 

.20 

3417 

2.50 

3.00 

.35 

3079* 

1.61 

.27 

.35 

3435* 

.83 

-.61 

.35 

3033 

1.54 

2.44 

.35 

3213 

.93 

.52 

.35 

3216 

1.27 

-.62 

.18 

3440* 

1.52 

2.00 

.30 

3041 

1.51 

.23 

.35 

3054* 

1.29 

-.93 

.31 

3251 

2.50 

2.39 

.35 

3062* 

1.47 

.43 

.30 

3221 

1.25 

-.52 

.17 

3406 

1.31 

2.48 

.35 

3405 

1.40 

.55 

.32 

3049 

1.15 

-.71 

.18 

3045 

1.02 

2.48 

.27 

3445* 

1.19 

.44 

.34 

3255 

1.14 

-.72 

.26 

3242 

.94 

2.40 

.35 

3218 

.82 

.58 

.12 

3067* 

1.07 

-.76 

.21 

3407 

1.02 

2.41 

.29 

3019 

1.31 

.29 

.29 

3246 

1.10 

-.72 

.28 

3263* 

.99 

2.29 

.35 

3207 

.70 

.46 

.28 

3022 

1.01 

-.48 

.30 

3241 

.91 

2.09 

.17 

3431 

.70 

.28 

.34 

3272* 

1.06 

-.81 

.35 

3414 

.88 

2.29 

.32 

3000 

1.24 

.52 

.35 

3017 

.99 

-.58 

.16 

3402 

.83 

2.44 

.35 

3046 

1.18 

.24 

.22 

3076* 

.94 

-.73 

.21 

3247 

.82 

2.42 

.35 

3042 

1.15 

.37 

.27 

3224 

.80 

-.50 

.37 

3228 

.67 

2.49 

.31 

3050 

1.13 

.35 

.18 

Mean  (F) 
Mean* (W) 

Stratum  2 

1.26 

1.22 

-.65 

-.68 

.20 

.22 

Mean  (F) 
Mean*(W) 

1.34 

1.33 

2.43 

2.39 

.32 

.32 

3066 

3034 

3262 

1.05 

1.01 

.81 

.53 

.37 

.47 

.31 

.28 

.35 

Stratum  8 

3438 

.70 

.21 

.27 

(20  items) 

(20  items) 

2.50 

1.13 

.40 

.28 

3023 

2.40 

-1.15 

.35 

3409 

1.28 

.00 

Mean*(W) 

1.14 

.40 

.29 

3202 

1.81 

-.99 

.21 

3234 

2.50 

1.73 

.00 

3415 

.85 

-.96 

.35 

3018 

.89 

1.25 

.35 

Stratum  5 

3245 

1.34 

-.96 

.21 

3204 

1.14 

1.66 

.35 

(15  items) 

3236 

1.26 

-1.20 

.33 

3422 

1.47 

1.50 

.35 

3282* 

2.06 

-.02 

.35 

3020 

1.23 

-1.28 

.17 

3411 

1.36 

1.23 

.35 

3220 

1.79 

-.03 

.26 

3028 

1.12 

-1.26 

.35 

3250 

.91 

1.94 

.29 

3005 

1.43 

.11 

.35 

3226 

1.09 

-.98 

.20 

3206 

.74 

1.51 

.21 

3425 

1.36 

.17 

.23 

3210 

1.04 

-1.22 

.35 

3410 

1.30 

1.34 

.31 

3053 

1.12 

.12 

.09 

3239 

1.04 

-1.13 

.21 

3429 

1.25 

1.24 

.28 

3214 

1.12 

.03 

.23 

3013 

1.00 

-.97 

.35 

3419 

1.23 

1.48 

.25 

3412 

1.12 

.19 

.35 

3267* 

1.02 

-1.22 

.23 

3421 

1.17 

1.15 

.35 

3051 

1.29 

.21 

.28 

3257 

.98 

-1.02 

.25 

3436* 

1.12 

1.59 

.35 

Item           a      b     c    Item           a      b     c    Item           a      b     c

3279*        .99    .01   .28    3070*        .95  -1.28   .22    3271*        .95   1.32   .30
3403         .99    .18   .19    3036         .92  -1.18   .16    3061*        .95   1.57   .30
3069*        .88   -.01   .35    3014         .86  -1.24   .14    3427         .92   1.51   .26
3211         .88    .01   .13    3060*        .86  -1.31   .29    3449*        .91   1.26   .14
3002         .82    .13   .14    3274*        .85  -1.05   .26    3063*        .91   1.51   .35
3426         .68    .07   .22    3238         .82  -1.06   .21    3074*        .84   1.79   .35
3423         .66    .16   .27    3032         .77  -1.06   .27    3420         .68   1.62   .35
Mean (F)    1.11    .11   .22    Mean (F)    1.16  -1.10   .26    Mean (F)    1.29   1.46   .26
Mean* (W)   1.15    .09   .24    Mean* (W)   1.11  -1.13   .26    Mean* (W)   1.19   1.47   .27

Stratum 4                        Stratum 1                        Stratum 7
(13 items)                       (17 items)                       (20 items)
3256        2.31   -.33   .26    3077*       2.50  -1.39   .20    3408        2.50   1.05   .31
3430        1.15   -.30   .29    3027        1.67  -1.38   .35    3437        1.95    .66   .28
3031        1.47   -.33   .35    3443*       1.07  -1.64   .35    3258        1.24    .81   .35
3254        3.38   -.17   .22    3249         .91  -1.69   .17    3432        1.72    .67   .35
3237        1.54   -.37   .18    3428         .90  -1.56   .35    3048        1.35    .66   .33
3404         .65   -.29   .35    3073*       1.43  -1.57   .31    3413        1.40    .76   .35
3244        1.35   -.44   .23    3205        1.25  -1.53   .19    3448*       1.40    .73   .30
3058*       1.05   -.43   .35    3078*       1.24  -1.65   .35    3439*       1.36    .64   .32
3240         .98   -.28   .15    3057*       1.20  -1.35   .26    3219        1.23    .62   .21
3268*        .97   -.28   .18    3065*       1.17  -1.66   .35    3072*       1.02    .65   .32
3208         .76   -.16   .12    3235        1.15  -1.40   .28    3277*       1.00   1.04   .35
3006         .77   -.37   .33    3029        1.13  -1.50   .28    3035         .90    .68   .28
3259         .69   -.41   .20    3201        1.07  -1.34   .23    3433        1.35    .86   .30
                                 3008         .96  -1.75   .18    3447*       1.18    .93   .32
Mean (F)    1.27   -.31   .25    3252         .79  -1.77   .35    3064*        .94    .86   .24
Mean* (W)   1.21   -.32   .25    3003         .96  -1.76   .34    3230         .90    .87   .35
Stratum 3                        3044         .87  -1.42   .15    3444*        .88    .78   .35
(19 items)                       Mean (F)    1.06  -1.55   .26    3012         .75    .80   .35
3021        1.96   -.49   .21    Mean* (W)   1.19  -1.55   .28    3260         .71    .84   .28
3217        1.06   -.48   .14                                     3056*        .71    .89   .26
1052        1.71   -.93   .00                                     Mean (F)    1.28    .78   .31
3055*       1.71   -.65   .24                                     Mean* (W)   1.22    .79   .31

Note. Items with asterisks are those which were added to the pool Winter quarter. All
other items were in the pool both Fall and Winter quarters.
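
The Mean (F) and Mean* (W) rows can be read as follows: Mean (F) averages only the items without an asterisk (the Fall pool), while Mean* (W) averages every item in the stratum (the Winter pool). This reading is an inference from the note, not something the table states outright. A minimal Python sketch that checks it against the Stratum 4 difficulty (b) values; the variable names are illustrative only.

```python
# Stratum 4 items from Table A: (item number, difficulty b, added_in_winter).
# added_in_winter is True for items printed with an asterisk.
stratum4 = [
    (3256, -0.33, False), (3430, -0.30, False), (3031, -0.33, False),
    (3254, -0.17, False), (3237, -0.37, False), (3404, -0.29, False),
    (3244, -0.44, False), (3058, -0.43, True),  (3240, -0.28, False),
    (3268, -0.28, True),  (3208, -0.16, False), (3006, -0.37, False),
    (3259, -0.41, False),
]

# Assumed reading: Fall pool = items without an asterisk.
fall_b = [b for _, b, added_in_winter in stratum4 if not added_in_winter]
# Assumed reading: Winter pool = all items in the stratum.
winter_b = [b for _, b, _ in stratum4]

mean_f = sum(fall_b) / len(fall_b)
mean_w = sum(winter_b) / len(winter_b)

print(f"Mean (F)  b = {mean_f:.2f} over {len(fall_b)} items")
print(f"Mean* (W) b = {mean_w:.2f} over {len(winter_b)} items")
```

Under this reading the two averages come out at about -.31 and -.32, which agrees with the Stratum 4 difficulty means in the table.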
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Table  B 

Item Number, Discrimination (a), Difficulty (b), and Guessing (c)
Parameters for Items in the Midquarter 2 Stradaptive Item Pool


Item           a      b     c    Item           a      b     c    Item           a      b     c

Stratum 9                        Stratum 6                        Stratum 3
(18 items)                       (20 items)                       (17 items)
3831        2.50   1.96   .06    3707*       1.75    .55   .31    3634        1.79   -.58   .30
3690        2.50   2.36   .24    3746*       1.59    .43   .30    3739*       1.68   -.61   .35
3833*       2.50   2.85   .35    3806        1.57    .48   .35    3809        1.27   -.61   .35
3909        2.45   1.48   .28    3925*       1.14    .48   .35    3924*       1.13   -.79   .18
3805        2.50   2.38   .35    3658        1.24    .32   .35    3672        1.57   -.80   .15
3698        2.11   2.82   .35    3905         .98    .35   .20    3737*       1.41   -.66   .34
3901        1.55   2.62   .35    3738*       1.34    .40   .35    3915        1.08   -.61   .16
3835*       1.21   2.28   .35    3605        1.22    .57   .34    3640        1.43   -.69   .35
3620        2.04   2.97   .35    3815         .95    .58   .35    3906         .37   -.66   .14
3697        1.56   3.00   .35    3611        1.22    .39   .32    3812         .82   -.63   .13
3810         .92   2.20   .27    3675        1.21    .40   .28    3682        1.33   -.72   .34
3664        1.11   1.60   .35    3820         .92    .38   .12    3637        1.29   -.73   .28
3625         .98   1.66   .35    3665        1.19    .54   .22    3636        1.24   -.63   .27
3622         .95   2.53   .35    3709*       1.19    .30   .35    3641        1.20   -.65   .22
3841*        .87   2.13   .35    3729*       1.14    .37   .30    3711*       1.05   -.56   .35
3651         .95   2.30   .35    3819         .76    .53   .35    3608        1.04   -.78   .16
3728*        .91   2.55   .35    3918*        .66    .35   .23    3705*        .87   -.58   .14
3712*        .75   1.64   .30    3614         .79    .46   .35
Mean (F)    1.70   2.31   .31    3923*        .63    .38   .31    Mean (F)    1.24   -.67   .24
Mean* (W)   1.58   2.30   .32    3626         .65    .52   .25    Mean* (W)   1.25   -.66   .25
Stratum 8                        Mean (F)    1.06    .46   .29    Stratum 2
(18 items)                       Mean* (W)   1.11    .44   .30    (20 items)
3615        1.69   1.17   .29    Stratum 5                        3735*       1.63   -.94   .35
3916        1.39   1.14   .35    (15 items)                       3648        1.59   -.96   .33
3673        1.51   1.11   .31    3792*       1.89    .27   .35    3807        1.52  -1.10   .17
3804         .95   1.42   .35    3745*       1.58   -.07   .20    3907        1.43  -1.08   .35
3733*       1.24   1.40   .35    3720*       1.45    .26   .29    3704*       1.39  -1.13   .23
3719*       1.18   1.08   .31    3607        1.38    .09   .35    3655        1.37   -.90   .35
3921*        .91   1.23   .29    3811        1.15    .22   .35    3813        1.20   -.97   .17
3827         .87   1.35   .35    3908        1.25    .07   .31    3919*       1.30   -.98   .21
3716*       1.14   1.14   .27    3649        1.32    .11   .22    3680        1.33  -1.01   .16
3642        1.11   1.11   .24    3632        1.23    .27   .35    3808         .99  -1.00   .30
3902         .73   1.49   .29    3718*       1.22    .16   .33    3686        1.26   -.88   .29
3627        1.03   1.07   .35    3629        1.11   -.03   .35    3721*       1.23  -1.20   .22
3681        1.03   1.54   .35    3732*        .96   -.01   .35    3821         .90   -.92   .35
3676         .89   1.51   .25    3633         .9?   -.08   .35    3679        1.21   -.94   .17
3644         .88   1.25   .35    3609         .78    .18   .35    3685        1.19  -1.01   .16
3717*        .83   1.25   .35    3730*        .75    .01   .10    3668         .97   -.87   .14
3670         .80   1.11   .35    3618         .64   -.05   .00    3684         .86   -.85   .14
3647         .79   1.14   .35                                     3703*        .83  -1.16   .21
                                 Mean (F)    1.08    .09   .29    3617         .79  -1.11   .14
Mean (F)    1.05   1.26   .32    Mean* (W)   1.17    .09   .28    3713*        .75  -1.18   .33
Mean* (W)   1.05   1.25   .32                                     Mean (F)    1.19   -.97   .23
Stratum 7                        Stratum 4                        Mean* (W)   1.19  -1.01   .24
(15 items)                       (19 items)
3793*       2.14    .68   .32    3744*       1.94   -.35   .30    Stratum 1
3661        1.90    .68   .32    3708*       1.62   -.20   .16    (19 items)
3674        1.72    .63   .26    3631        1.53   -.18   .35    3741*       1.63  -1.56   .35
3909        1.39    .77   .35    3819        1.26   -.32   .35    3910        1.58  -1.59   .21
3662        1.54    .93   .27    3903        1.21   -.43   .31    3692        1.53  -1.28   .35
3654        1.51    .84   .21    3671        1.51   -.14   .26    3825        1.09  -1.38   .34
3669        1.45    .70   .32    3701         .82   -.15   .35    3639        1.47  -1.80   .35
3623        1.42    .74   .31    3643        1.40   -.50   .25    3638        1.35  -1.54   .21
3912         .95    .70   .19    3914         .98   -.39   .16    3913        1.31  -1.31   .19
3734*        .89    .96   .35    3693        1.13   -.24   .24    3837*       1.09  -1.59   .25
3700         .84    .85   .30    3725*       1.09   -.52   .24    3715*       1.16  -1.63   .26
3659        1.37    .67   .29    3710*       1.02   -.33   .30    3920*       1.12  -1.34   .23
3635        1.17    .66   .35    3653         .83   -.51   .33    3842*       1.01  -1.55   .35
3612        1.12    .75   .35    3660         .78   -.39   .14    3695        1.09  -1.73   .22
3616         .86    .62   .25    3922*        .64   -.26   .30    3731*       1.05  -1.67   .35
                                 3606         .71   -.22   .14    3832         .99  -1.7?   .32
Mean (F)    1.32    .73   .29    3663         .69   -.17   .33    3838*        .99  -1.68   .35
Mean* (W)   1.35    .75   .30    3696         .68   -.35   .00    3613         .86  -1.74   .33
                                 3656         .63   -.31   .34    3683         .85  -1.31   .14
                                 Mean (F)    1.01   -.31   .25    3657         .81  -1.74   .35
                                 Mean* (W)   1.08   -.31   .26    3610         .80  -1.33   .14
                                                                  Mean (F)    1.14  -1.54   .26
                                                                  Mean* (W)   1.15  -1.55   .28

Note. Items with asterisks are those which were added to the pool Winter quarter. All
other items were in the pool both Fall and Winter quarters.
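
For reference, the a, b, and c values tabulated here are the parameters of the three-parameter logistic model. The sketch below assumes the common form P(theta) = c + (1 - c) / (1 + exp(-D * a * (theta - b))) with scaling constant D = 1.7. The table itself does not say which scaling was used, so D, like the function name p_correct, is an assumption made for illustration.

```python
import math

D = 1.7  # assumed scaling constant; not given in the table


def p_correct(theta, a, b, c, d=D):
    """Probability of a correct response under the three-parameter logistic model.

    theta: examinee ability; a: discrimination; b: difficulty; c: guessing.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-d * a * (theta - b)))


# Item 3831 from Stratum 9 of Table B: a = 2.50, b = 1.96, c = .06
item_3831 = {"a": 2.50, "b": 1.96, "c": 0.06}

for theta in (-1.0, 0.0, 1.0, 1.96, 3.0):
    print(f"theta = {theta:5.2f}  P(correct) = {p_correct(theta, **item_3831):.3f}")
```

Whatever the scaling, the probability at theta = b is c + (1 - c)/2, and it falls toward c as theta decreases.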

Table  C 

Item Discrimination (a), Difficulty (b), and Guessing (c)
Parameters for Classroom Tests MQ1C and MQ2C in Fall Quarter


MQ1C                             MQ2C
Item No.       a      b     c    Item No.       a      b     c

3060         .86  -1.31   .29    3922         .64   -.26   .30
3067        1.07   -.76   .21    3904        2.45   1.58   .28
3065        1.17  -1.66   .35    3918         .66    .35   .23
3056         .71    .89   .26    3921         .91   1.23   .29
3063         .91   1.51   .35    3919        1.30   -.98   .21
3073        1.43  -1.57   .31    3920        1.12  -1.34   .23
3058        1.05   -.43   .35    3923         .63    .38   .31
3274         .85  -1.05   .26    3924        1.13   -.79   .18
3271         .95   1.32   .30    3801         .80   -.17   .35
3055        1.71   -.65   .24    3841         .87   2.13   .35
3072        1.02    .65   .32    3838         .99  -1.68   .35
3057        1.20  -1.35   .26    3833        2.50   2.85   .35
3064         .94    .86   .24    3837        1.09  -1.59   .25
3069         .88   -.01   .35    3835        1.21   2.28   .35
3054        1.29   -.93   .31    3641        1.20   -.65   .22
3066        1.05    .53   .31    3708        1.62   -.20   .16
3268         .97   -.28   .18    3718        1.22    .16   .33
3267        1.02  -1.22   .23    3728         .91   2.55   .35
3272        1.06   -.81   .35    3665        1.19    .54   .22
3070         .95  -1.28   .22    3730         .75    .01   .10
3008         .96  -1.75   .18    3719        1.18   1.08   .31
3019        1.31    .29   .29    3705         .87   -.58   .14
3062        1.47    .43   .30    3713         .75  -1.18   .33
3061         .95   1.57   .30    3703         .83  -1.16   .21
3262         .81    .47   .35    3709        1.19    .30   .35
3263         .99   2.29   .35    3707        1.75    .55   .31
3447        1.18    .93   .32    3721        1.23  -1.20   .22
3443        1.07  -1.64   .35    3717         .83   1.25   .35
3438         .70    .21   .27    3715        1.16  -1.63   .26
3448        1.40    .73   .30    3716        1.14   1.14   .27
3435         .83   -.61   .35    3720        1.45    .26   .29
3439        1.36    .64   .32    3744        1.94   -.35   .30
3436        1.12   1.59   .35    3745        1.58   -.07   .20
3449         .91   1.26   .14    3746        1.59    .43   .30
3440        1.52   2.00   .30    3711        1.05   -.56   .35
3437        1.95    .66   .28    3710        1.02   -.33   .30
3427         .92   1.51   .26    3724        1.14    .37   .30
3445        1.19    .44   .34    3725        1.09   -.52   .24
3444         .88    .78   .35    3731        1.05  -1.67   .35
                                 3712         .75   1.64   .30
                                 3704        1.39  -1.13   .23

Mean        1.09    .11   .29    Mean        1.17    .07   .28
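
The classroom-test parameters can be combined into an expected number-correct score at a given ability level by summing the three-parameter logistic probabilities over the items of a test. The following is a sketch only: it assumes the logistic form with scaling constant D = 1.7 (not stated in the table), and the names MQ1C_FALL, p_correct, and expected_score are hypothetical. The (a, b, c) triples are the 39 MQ1C entries of Table C in table order.

```python
import math

D = 1.7  # assumed scaling constant; not stated in the table

# (a, b, c) for the 39 MQ1C items listed in Table C (Fall quarter).
MQ1C_FALL = [
    (0.86, -1.31, 0.29), (1.07, -0.76, 0.21), (1.17, -1.66, 0.35),
    (0.71, 0.89, 0.26), (0.91, 1.51, 0.35), (1.43, -1.57, 0.31),
    (1.05, -0.43, 0.35), (0.85, -1.05, 0.26), (0.95, 1.32, 0.30),
    (1.71, -0.65, 0.24), (1.02, 0.65, 0.32), (1.20, -1.35, 0.26),
    (0.94, 0.86, 0.24), (0.88, -0.01, 0.35), (1.29, -0.93, 0.31),
    (1.05, 0.53, 0.31), (0.97, -0.28, 0.18), (1.02, -1.22, 0.23),
    (1.06, -0.81, 0.35), (0.95, -1.28, 0.22), (0.96, -1.75, 0.18),
    (1.31, 0.29, 0.29), (1.47, 0.43, 0.30), (0.95, 1.57, 0.30),
    (0.81, 0.47, 0.35), (0.99, 2.29, 0.35), (1.18, 0.93, 0.32),
    (1.07, -1.64, 0.35), (0.70, 0.21, 0.27), (1.40, 0.73, 0.30),
    (0.83, -0.61, 0.35), (1.36, 0.64, 0.32), (1.12, 1.59, 0.35),
    (0.91, 1.26, 0.14), (1.52, 2.00, 0.30), (1.95, 0.66, 0.28),
    (0.92, 1.51, 0.26), (1.19, 0.44, 0.34), (0.88, 0.78, 0.35),
]


def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def expected_score(theta, items):
    """Expected number of correct answers on a test made up of `items`."""
    return sum(p_correct(theta, a, b, c) for a, b, c in items)


for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    score = expected_score(theta, MQ1C_FALL)
    print(f"theta = {theta:+.1f}: expected score {score:.1f} of {len(MQ1C_FALL)}")
```

Because every item has a nonzero guessing parameter, the expected score never drops below the sum of the c values, however low the ability.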

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Table  D 

Item Discrimination (a), Difficulty (b), and Guessing (c) Parameters
for Classroom Tests MQ1C and MQ2C in Winter Quarter


MQ1C                             MQ2C
Item No.       a      b     c    Item No.       a      b     c

3287         .85  -1.28   .13    3750         .93  -1.79   .34
3292         .68   1.39   .35    3926         .93  -1.56   .16
3219        1.23    .62   .21    3845        1.71    .26   .29
3290        1.16   -.57   .20    3763        1.23   1.95   .28
3214        1.12    .03   .23    3762        1.97  -1.56   .17
3268         .97   -.28   .18    3772         .74   -.84   .35
3289        1.14  -1.45   .35    3759         .99   -.14   .21
3293         .96  -1.30   .14    3768        1.11  -1.55   .17
3291         .65    .52   .35    3756        1.10   -.21   .28
3249         .91  -1.69   .17    3749        1.05  -1.77   .22
3083        1.05   -.90   .13    3757        1.18  -1.60   .18
3090        1.48  -1.65   .18    3755        1.03   -.12   .16
3054        1.29   -.93   .31    3747        1.11  -1.69   .18
3084        1.22  -1.06   .15    3753         .91   -.55   .17
3092         .98   -.65   .15    3654        1.51    .84   .21
3082        1.05   2.27   .35    3673        1.51   1.11   .31
3011        1.32   -.86   .20    3716        1.14   1.14   .27
3095         .79  -1.20   .12    3700         .84    .85   .30
3085        1.16  -1.81   .35    3773        1.69   1.62   .27
3423         .66    .16   .27    3748         .85   1.31   .35
3453        1.19    .48   .22    3766        1.12   1.41   .35
3456        1.03   2.71   .35    3760        1.28  -1.58   .18
3454        1.10   2.66   .35    3758         .89  -1.45   .15
3460        1.99   1.59   .34    3703         .83  -1.16   .21
3452         .75   1.98   .31    3853        1.05    .12   .17
3406        1.31   2.48   .35    3854        1.03   -.19   .31
3461         .94   1.51   .35    3852         .69  -1.78   .35
3457         .90   1.87   .28    3850         .89   1.83   .35
3459         .84   -.29   .26    3851         .76    .18   .23
3407        1.02   2.41   .29    3752        1.24   -.50   .19
3458        1.46  -1.10   .15    3769        1.15   -.39   .16
3432        1.72    .67   .35    3751         .80   1.91   .35
3455         .96   -.61   .31    3770        2.50   1.73   .00
3420         .68   1.62   .35    3622         .95   2.53   .35
3433        1.35    .86   .30    3761         .84   1.27   .32
3412        1.12    .19   .35    3767        1.02   -.04   .30
3462        1.31  -1.03   .17    3930        1.21   -.44   .35
3285         .79   -.60   .11    3904        2.45   1.58   .28
3294         .76   -.68   .19    3918         .66    .35   .23
3041        1.51    .23   .35    3903        1.21   -.43   .31
3091        1.64    .58   .30    3928        1.00    .65   .35
3089         .92   -.37   .30    3929         .96  -1.76   .22
3093         .75   -.94   .11    3813        1.20   -.97   .17
3096        1.48  -1.48   .16    3927        1.01  -1.34   .16
3086         .74   -.67   .35

Mean        1.09    .08   .25    Mean        1.14   -.06   .25

Table E (continued)

Item Discrimination (a) and Difficulty (b) Parameter Estimates for Vocabulary Items by Stratum and Midquarter Subpool

(c = .20 for All Items)
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